Skip to content

feat(ascend): add top-k top-p sampler#603

Merged
voltjia merged 1 commit into
InfiniTensor:masterfrom
zhangyue207:feat/ascend-top-k-top-p-sampler
Jul 2, 2026
Merged

feat(ascend): add top-k top-p sampler#603
voltjia merged 1 commit into
InfiniTensor:masterfrom
zhangyue207:feat/ascend-top-k-top-p-sampler

Conversation

@zhangyue207

@zhangyue207 zhangyue207 commented May 12, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Add the public TopKTopPSampler base API in src/base/top_k_top_p_sampler.h.
  • Add an Ascend ACLNN implementation in src/native/ascend/ops/top_k_top_p_sampler/kernel.h.
  • Add focused Ascend tests in tests/test_top_k_top_p_sampler.py.
  • Keep the branch rebased onto current upstream master (d67759e).

Motivation

N/A. No issue is linked.

This adds a logits-level Ascend sampler boundary needed by inference pipelines that already prepare top-k and top-p controls outside InfiniOps.

Type of Change

  • feat - new feature / new operator / new platform
  • fix - bug fix
  • perf - performance improvement (no behavioral change)
  • refactor - code restructuring without behavior change
  • test - adding or fixing tests only
  • docs - documentation only
  • build / ci - build system or CI configuration
  • chore - tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

Platform Built pytest Result Notes / Hardware
NVIDIA CI queued CI queued Re-run triggered on head 5e36a1d.
Iluvatar CI queued CI queued Re-run triggered on head 5e36a1d.
MetaX CI queued CI queued Re-run triggered on head 5e36a1d.
Cambricon CI queued CI queued Re-run triggered on head 5e36a1d.
Moore CI queued CI queued Re-run triggered on head 5e36a1d.
Ascend CI queued CI queued Re-run triggered on head 5e36a1d.
Local checks
git diff --check upstream/master...HEAD
clang-format --dry-run --Werror src/base/top_k_top_p_sampler.h src/native/ascend/ops/top_k_top_p_sampler/kernel.h
ruff format --check tests/test_top_k_top_p_sampler.py
ruff check tests/test_top_k_top_p_sampler.py

All commands passed locally after rebasing onto upstream/master d67759e.

GitHub CI re-run
Head SHA: 5e36a1d029c8cd3695cdb60bf8190dc5a9f25f22
Base SHA: d67759e0c59de09087a4e8774c8a5f4db7bd5134
Triggered at: 2026-06-30 06:11 UTC
Status at description update: ruff and matrix generation passed; clang-format in progress; platform unit jobs queued.

Benchmark / Performance Impact

N/A. This PR adds a correctness-tested operator wrapper and does not include a performance benchmark.

Notes for Reviewers

API alignment note:

  • Reference checked: vLLM vllm.v1.sample.ops.topk_topp_sampler.TopKTopPSampler, whose native path receives logits, generators, k, and p, applies top-k/top-p filtering, and samples from logits.
  • InfiniOps base signature: TopKTopPSampler(logits, k, p, out) and operator()(logits, k, p, out).
  • Intentional deviations: InfiniOps uses an output tensor instead of returning sampled ids, does not expose Python generators, and does not return processed logits/logprobs. Temperature scaling remains caller-owned.
  • Current Ascend ACLNN implementation is verified for the deterministic top_k == 1 subset; broader stochastic sampling is left out of this minimal PR.

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz.
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit.
  • No stray merge commits from master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal.
  • No dead code, commented-out blocks, debug prints, or ownerless TODO entries.
  • No unrelated formatting churn.
  • Public API changes are intentional, documented, and reflected in tests.

General Code Hygiene

  • The code is self-explanatory; comments were added only where the reason is non-obvious.
  • Every modified or added file ends with a single trailing newline.
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks.
  • All comments and error messages are in English.
  • Comments and error messages are complete sentences unless the language/framework convention says otherwise.

C++ Specific

  • Code follows the Google C++ style used in this repository.
  • clang-format --dry-run --Werror src/base/top_k_top_p_sampler.h src/native/ascend/ops/top_k_top_p_sampler/kernel.h passes.
  • Operator parameter order is inputs, attributes, outputs.
  • No exceptions are thrown.
  • Error message wording was reviewed.
  • Kernel and kernel launcher separation does not apply to this ACLNN wrapper header.
  • Constructor initializer list order matches member declaration order.
  • Blank line rules were reviewed.
  • New operator follows the src/base/<op>.h plus src/native/ascend/ops/<op>/ pattern.
  • No raw new / delete.

Python Specific

  • ruff format --check tests/test_top_k_top_p_sampler.py passes.
  • ruff check tests/test_top_k_top_p_sampler.py passes.
  • Framework-specific conventions are honored where applicable.
  • Function body spacing follows the project Python rules.

Testing

  • GitHub CI re-run was triggered by updating the PR branch.
  • New functionality has matching tests under tests/.
  • Tests use pytest.mark.parametrize and pytest.mark.auto_act_and_assert.
  • Default device parameterization is used.
  • N/A: No bug fix regression test is needed for this feature PR.

Build, CI, and Tooling

  • git diff --check passes.
  • No new runtime dependency was added.
  • No CMake or CI configuration changes remain in this PR.

Documentation

  • N/A: README.md / CONTRIBUTING.md changes are not needed for this small operator addition.
  • New public operator behavior is documented in the base header and tests.
  • N/A: No user-visible breaking change.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • No third-party code was added.
  • No unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

@zhangyue207 zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch 3 times, most recently from 2444a24 to 2e95fa7 Compare May 29, 2026 05:53
@zhangyue207 zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 2e95fa7 to 611cab2 Compare June 2, 2026 02:40
@zhangyue207 zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 611cab2 to 5e36a1d Compare June 30, 2026 06:11
@gongchensu

Copy link
Copy Markdown
Contributor

现在看起来好像只支持k=1是吗?

@zhangyue207 zhangyue207 marked this pull request as ready for review July 2, 2026 05:28
@zhangyue207 zhangyue207 requested a review from a team July 2, 2026 05:28
@zhangyue207

Copy link
Copy Markdown
Collaborator Author

现在看起来好像只支持k=1是吗?

看起来是的,我补充一下

@zhangyue207 zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 5e36a1d to 2e56167 Compare July 2, 2026 07:12
@gongchensu

Copy link
Copy Markdown
Contributor

现在看起来好像只支持k=1是吗?

看起来是的,我补充一下

Ascend 路径现在还是不会真正 sample,只会选 argmax。
kernel.h (line 108) 调 aclnnTopKTopPSampleGetWorkspaceSize 时传的是 qOptional=nullptr。在CANN 8.5.1 和 CANN 9.0.0 里看了这个 ACLNN 算子的源码/声明:只有传了 q 才会走 QSampleCompute,否则最后是对 softmax 后的候选集合做 ArgMaxAndGather。
所以现在即使用户传 k=10 或 p=0.9,Ascend 也只是先过滤候选,然后仍然选最高 logit 的 token;不会像 CPU 实现那样随机采样。测试也没覆盖到这个,因为 test_top_k_top_p_sampler.py (line 23) 只测了 k=1,reference 也是 torch.argmax。
这版已经修掉了之前“强制 k=1”的表层问题,但 Ascend sampling 语义还没真正修好。要么需要给 ACLNN 传 [batch, vocab] 的 float32 q 随机/采样权重 tensor,要么把这个 op 的语义明确收窄成 greedy/top-k filtered argmax。

@zhangyue207 zhangyue207 force-pushed the feat/ascend-top-k-top-p-sampler branch from 2e56167 to dc7fca9 Compare July 2, 2026 08:15
@zhangyue207

zhangyue207 commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

已处理:

  • Ascend 路径现在会生成并传入 [batch_size, vocab_size] 的 float32 q tensor,不再给 aclnnTopKTopPSampleqOptional=nullptr
  • q 使用 exponential noise 填充,让 ACLNN 走 QSampleCompute 路径,而不是 filtered argmax。
  • 保留了 k != 1 支持;测试仍只覆盖 k=1 的确定性 case,避免随机采样导致 flaky。
  • 额外在 Ascend 容器里做过 k=2 smoke test,固定 logits 下多次运行能采到候选集合里的两个 token,确认不是恒定 argmax。
  • 最新 CI 已全绿。

@voltjia voltjia merged commit 70e9d01 into InfiniTensor:master Jul 2, 2026
24 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants